Optical Character Recognition with Hugging Face Spaces



2023-01-05

What is Hugging Face Hub?

The Hugging Face Hub is a platform that allows developers to store and share code, as well as collaborate on machine learning projects. It hosts Git-based repositories, a type of version-controlled storage where developers can keep all of their project files. Developers can upload and access cutting-edge models for natural language processing, computer vision, and audio tasks on the Hub. It also provides a variety of datasets across domains and modalities. Finally, developers can explore interactive apps that show off ML models directly in their browsers. To learn more about the Hugging Face Hub, check out the documentation.

What is a Hugging Face Space?

Spaces is a Hub feature that allows developers to quickly create and showcase ML demo apps. It is compatible with two Python Software Development Kits (SDKs), Gradio and Streamlit, both of which make it simple to build apps in a short amount of time. Users can also create static Spaces, which are HTML, CSS, and JavaScript web pages hosted inside a Space. Visit the Spaces documentation if you want to find out more about Spaces and how to create your own. You can also upgrade your Space to run on a GPU or other accelerated hardware.

Let’s get a quick overview of Optical Character Recognition (OCR).

Optical Character Recognition

Optical Character Recognition (OCR) is a technique for recognizing text in images such as scanned documents and photos. Modern OCR systems typically use deep learning: a convolutional neural network analyses the image, and the extracted features are fed into an OCR engine trained to recognize words and characters. The engine's output is then used to generate a text version of the original image. OCR is commonly used to extract text from images in order to automate data entry and document management processes.

There are many libraries and techniques for OCR. Here we are going to implement text recognition with three of them: PaddleOCR, keras-ocr, and EasyOCR.

In this tutorial, we will see how to host our OCR app on Hugging Face Spaces. First, you need to create a repository on Hugging Face Spaces by following the steps below.

Steps for creating a repository on Hugging Face (🤗) Spaces:

Step 1: Create an account on the 🤗 Hub and create a new Space. Go to the Files and versions tab. You will see a generated README.md file for the project.

Step 2: Set the metadata in your README.md file as shown in the image. You can replace the metadata values as per your requirements and save them. You can find more about metadata options in the 🤗 Spaces configuration reference.
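For reference, a typical metadata block at the top of a Space's README.md looks like the following (the values here are illustrative placeholders, not the ones from this project):

```yaml
---
title: OCR Demo
emoji: 🔍
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 3.15.0
app_file: app.py
pinned: false
---
```

The `sdk` and `app_file` keys tell the Space which SDK to run and which file is the app's entry point.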

Step 3: Now you can create new files or upload the project files from your local system as shown below. You need to add all the required libraries to the requirements.txt file; the 🤗 server will automatically install them. Another way to upload the entire project is to use huggingface_hub; for this, make sure you are logged in to 🤗 from your system. Then you can follow the huggingface_hub steps to upload your local folder to your 🤗 Space.
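As an example, a requirements.txt for this project might list the libraries used in the code below (the exact set and versions are illustrative; include whatever your app imports):

```
gradio
easyocr
keras-ocr
paddleocr
paddlepaddle
opencv-python-headless
tensorflow
datasets
huggingface_hub
```

The Space rebuilds and reinstalls these packages automatically whenever the file changes.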

Step 4: Now let's start with the code; we will write it in the app.py file.

Let’s start our code implementation

1. Import all libraries

```python
import os
import cv2
import json
import easyocr
import datasets
import socket
import requests
import keras_ocr
import numpy as np
import gradio as gr
import pandas as pd
import tensorflow as tf
import re as r
from PIL import Image  # keep PIL's Image; do not shadow it with `from datasets import Image`,
                       # since Image.fromarray below comes from PIL
from datetime import datetime
from paddleocr import PaddleOCR
from urllib.request import urlopen
from huggingface_hub import Repository, upload_file
```

2. We have written OCR generation functions separately for all 3 methods.

Code for Paddle OCR:

""" Paddle OCR """ def ocr_with_paddle(img): finaltext = '' ocr = PaddleOCR(lang='en', use_angle_cls=True) # img_path = 'exp.jpeg' result = ocr.ocr(img) for i in range(len(result[0])): text = result[0][i][1][0] finaltext += ' '+ text return finaltext

Code for Keras OCR:

""" Keras OCR """ def ocr_with_keras(img): output_text = '' pipeline=keras_ocr.pipeline.Pipeline() images=[keras_ocr.tools.read(img)] predictions=pipeline.recognize(images) first=predictions[0] for text,box in first: output_text += ' '+ text return output_text

Code for Easy OCR:

""" easy OCR """ # gray scale image def get_grayscale(image): return cv2.cvtColor(image, cv2.COLOR_BGR2GRAY) # Thresholding or Binarization def thresholding(src): return cv2.threshold(src,127,255, cv2.THRESH_TOZERO)[1] def ocr_with_easy(img): gray_scale_image=get_grayscale(img) thresholding(gray_scale_image) cv2.imwrite('image.png',gray_scale_image) reader = easyocr.Reader(['th','en']) bounds = reader.readtext('image.png',paragraph="False",detail = 0) bounds = ''.join(bounds) return bounds

3. Create a common function for all OCR methods that takes an image as input and returns the text generated from it.

""" Generate OCR """ def generate_ocr(Method,input_image): text_output = '' if (input_image).any(): print("Method___________________",Method) if Method == 'EasyOCR': text_output = ocr_with_easy(input_image) if Method == 'KerasOCR': text_output = ocr_with_keras(input_image) if Method == 'PaddleOCR': text_output = ocr_with_paddle(input_image) flag(Method,input_image,text_output,ip_address,location) return text_output else: raise gr.Error("Please upload an image!!!!")

4. After defining these functions, let's use Gradio to integrate our code with a user interface.

Gradio

Gradio is a useful tool for developers because it allows them to quickly and easily build interactive user interfaces for their machine-learning models. This can be especially useful for demonstrating the capabilities of a model to others, or for gathering user feedback on a model's performance. Additionally, Gradio apps can run inside Jupyter notebooks or generate shareable public links, making it a great tool for collaboration. If you want to learn more about Gradio, please follow this link.

This is the UI for our demo using Gradio app

Basically, we can launch a Gradio demo in two ways: using gr.Blocks or gr.Interface.

There are three main parameters in a Gradio Interface:
1. Function: the function that handles the interface's primary logic
2. Input: the type of input component
3. Output: the type of output component

The final section of the code launches the interface. It is made up of various components such as the function, inputs, outputs, title, description, and more. The Gradio Interface documentation describes all of these components.

```python
image = gr.Image(shape=(300, 300))
method = gr.Radio(["PaddleOCR", "EasyOCR", "KerasOCR"], value="PaddleOCR", elem_id="radio_div")
output = gr.Textbox(label="Output", elem_id="opbox")

demo = gr.Interface(
    generate_ocr,
    [method, image],
    output,
    title="Optical Character Recognition",
    css=".gradio-container {background-color: #C0E1F2} #radio_div {background-color: #ADA5EC; font-size: 40px;} #btn {background-color: #94D68B; font-size: 20px;} #opbox {background-color: #ADA5EC;}",
    article="""
Feel free to give us your feedback and contact us at [email protected] And don't forget to check out more interesting NLP services we are offering.

Developed by : Pragnakalp Techlabs
"""
)
demo.launch()
```

Saving data and logs on the Hugging Face Hub datasets

After creating your application, you may want to log the user input and the results. To do so, follow the steps below. Here, we have used a Hugging Face dataset to store the logs.

Step 1: To save/store logs or data, create a new dataset on 🤗 Datasets. You can refer to the Datasets documentation for detailed information.

Step 2: To connect to your dataset, use the code snippet below.

```python
HF_TOKEN = os.environ.get("HF_TOKEN")
DATASET_NAME = "OCR-img-to-text"
DATASET_REPO_URL = f"https://huggingface.co/datasets/pragnakalp/{DATASET_NAME}"
DATASET_REPO_ID = "pragnakalp/OCR-img-to-text"
print("is none?", HF_TOKEN is None)
REPOSITORY_DIR = "data"
LOCAL_DIR = 'data_local'
os.makedirs(LOCAL_DIR, exist_ok=True)

repo = Repository(
    local_dir="ocr_data",
    clone_from=DATASET_REPO_URL,
    use_auth_token=HF_TOKEN
)
repo.git_pull()
```

Here, HF_TOKEN is a 🤗 User Access Token, the most common method of authenticating an application or notebook to 🤗 services. Note: When creating your token, set its role to "write". After generating the access token, copy it and save it in your Space's Settings → Repository secrets, keeping the name "HF_TOKEN".
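A small, hypothetical helper illustrating this pattern; `get_hf_token` and the stand-in mapping are our own names, not part of huggingface_hub:

```python
import os

def get_hf_token(env=os.environ):
    """Read the HF_TOKEN secret, failing fast with a clear message when it is missing."""
    token = env.get("HF_TOKEN")
    if token is None:
        raise RuntimeError("HF_TOKEN not set; add it under Settings -> Repository secrets")
    return token

# Illustration with a stand-in mapping instead of the real environment:
print(get_hf_token({"HF_TOKEN": "hf_xxx"}))  # prints the token value
```

Failing early like this gives a clearer error than letting an unauthenticated upload fail later.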

DATASET_REPO_ID will be your path to the dataset. REPOSITORY_DIR will be your folder name for saving data.
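For example, with REPOSITORY_DIR set to "data" and a timestamp-based folder name per logged example, the files end up at paths like these (a stdlib sketch of the layout; the fixed datetime is just for illustration):

```python
import os
from datetime import datetime

REPOSITORY_DIR = "data"
# Each logged example gets a timestamped folder name, e.g. "2023-01-05 04-20-00"
metadata_name = datetime(2023, 1, 5, 4, 20, 0).strftime('%Y-%m-%d %H-%M-%S')

repo_image_path = os.path.join(REPOSITORY_DIR, metadata_name, 'image.png')
repo_json_path = os.path.join(REPOSITORY_DIR, metadata_name, 'metadata.jsonl')
print(repo_image_path)  # e.g. data/2023-01-05 04-20-00/image.png on Linux
```

Grouping the image and its metadata under one timestamped folder keeps each logged example self-contained in the dataset repo.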

Step 3: Write a function for saving data.

""" Save generated details """ def dump_json(thing,file): with open(file,'w+',encoding="utf8") as f: json.dump(thing,f) def flag(Method,input_image,text_output,ip_address,location): try: print("saving data------------------------") adversarial_number = 0 adversarial_number = 0 if None else adversarial_number metadata_name = datetime.now().strftime('%Y-%m-%d %H-%M-%S') SAVE_FILE_DIR = os.path.join(LOCAL_DIR,metadata_name) os.makedirs(SAVE_FILE_DIR,exist_ok=True) image_output_filename = os.path.join(SAVE_FILE_DIR,'image.png') try: Image.fromarray(input_image).save(image_output_filename) except Exception: raise Exception(f"Had issues saving PIL image to file") # Write metadata.json to file json_file_path = os.path.join(SAVE_FILE_DIR,'metadata.jsonl') metadata= {'id':metadata_name,'method':Method, 'File_name':'image.png','generated_text':text_output, 'ip_address': ip_address,'loc': location} dump_json(metadata,json_file_path) # Simply upload the image file and metadata using the hub's upload_file # Upload the image repo_image_path = os.path.join(REPOSITORY_DIR,os.path.join (metadata_name,'image.png')) _ = upload_file(path_or_fileobj = image_output_filename, path_in_repo =repo_image_path, repo_id=DATASET_REPO_ID, repo_type='dataset', token=HF_TOKEN ) # Upload the metadata repo_json_path = os.path.join(REPOSITORY_DIR,os.path.join (metadata_name,'metadata.jsonl')) _ = upload_file(path_or_fileobj = json_file_path, path_in_repo =repo_json_path, repo_id= DATASET_REPO_ID, repo_type='dataset', token=HF_TOKEN ) adversarial_number+=1 repo.git_pull() return "*****Logs save successfully!!!!" except Exception as e: return "Error whils saving logs -->"+ str(e)

You can see the log dataset preview in the below image.


